Old Content and Modern Tools - Searching Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910

Authors

  • Kimmo Kettunen
  • Eetu Mäkelä
  • Teemu Ruokolainen
  • Juha Kuokkala
  • Laura Löfberg
Abstract

Kimmo Kettunen, Eetu Mäkelä, Teemu Ruokolainen, Juha Kuokkala and Laura Löfberg

1 National Library of Finland, Centre for Preservation and Digitization, Mikkeli, Finland
2 Aalto University, Semantic Computing Research Group, Espoo, Finland
3 National Library of Finland, Centre for Preservation and Digitization, Mikkeli, Finland
4 University of Helsinki, Department of Modern Languages, Helsinki, Finland
5 Department of Linguistics and English Language, Lancaster University, UK

Similar resources

Modern Tools for Old Content - in Search of Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910

Named entity recognition (NER), search, classification and tagging of names and name-like frequent informational elements in texts, has become a standard information extraction procedure for textual data. NER has been applied to many types of texts and different types of entities: newspapers, fiction, historical records, persons, locations, chemical compounds, protein families, animals etc. In ...

Tagging Named Entities in 19th Century and Modern Finnish Newspaper Material with a Finnish Semantic Tagger

Named Entity Recognition (NER), search, classification and tagging of names and name-like informational elements in texts, has become a standard information extraction procedure for textual data during the last two decades. NER has been applied to many types of texts and different types of entities: newspapers, fiction, historical records, persons, locations, chemical compounds, protein familie...

Measuring Lexical Quality of a Historical Finnish Newspaper Collection ― Analysis of Garbled OCR Data with Basic Language Technology Tools and Means

The National Library of Finland has digitized a large proportion of the historical newspapers published in Finland between 1771 and 1910 (Bremer-Laamanen 2001). This collection contains approximately 1.95 million pages in Finnish and Swedish. The Finnish part of the collection consists of about 2.39 billion words. The National Library’s Digital Collections are offered via the digi.kansalliskirjasto...

How to do lexical quality estimation of a large OCRed historical Finnish newspaper collection with scarce resources

Digitization of both hand-written and printed historical material during the last 10–15 years has been an ongoing academic and non-academic industry. Most probably this activity will only increase in the current Digital Humanities era. As a result of past and current work we have lots of digital historical document collections available and will have more of them in the future. The National Lib...

Keep, Change or Delete? Setting up a Low Resource OCR Post-correction Framework for a Digitized Old Finnish Newspaper Collection

There has been a huge interest in digitization of both hand-written and printed historical material in the last 10–15 years and most probably this interest will only increase in the ongoing Digital Humanities era. As a result of the interest we have lots of digital historical document collections available and will have more of them in the future. The National Library of Finland has digitized a...

Journal title:
  • Digital Humanities Quarterly

Volume 11

Publication date: 2017